Efficient Algorithms for Masking and Finding Quasi-Identifiers
Authors
Abstract
A quasi-identifier refers to a subset of attributes that can uniquely identify most tuples in a table. Incautious publication of quasi-identifiers will lead to privacy leakage. In this paper we consider the problems of finding and masking quasi-identifiers. Both problems are provably hard with severe time and space requirements. We focus on designing efficient approximation algorithms for large data sets. We first propose two natural measures for quantifying quasi-identifiers: distinct ratio and separation ratio. We develop efficient algorithms that find small quasi-identifiers with provable size and separation/distinct ratio guarantees, with space and time requirements sublinear in the number of tuples. We also propose efficient algorithms for masking quasi-identifiers, where we use a random sampling technique to greatly reduce the space and time requirements, without much sacrifice in the quality of the results. Our algorithms for masking and finding quasi-identifiers naturally apply to stream databases. Extensive experimental results on real world data sets confirm efficiency and accuracy of our algorithms.
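As a rough illustration of the two measures named in the abstract, the sketch below computes them for a small in-memory table. It assumes the usual reading of the terms: the distinct ratio is the fraction of distinct values left after projecting the table onto the chosen attributes, and the separation ratio is the fraction of tuple pairs that those attributes tell apart. The function names, the toy table, and the uniform-sampling estimator are hypothetical and are not taken from the paper; the paper's own algorithms and guarantees are not reproduced here.

```python
from collections import Counter
import random


def project(rows, attrs):
    """Project each row (a dict) onto the given attribute names."""
    return [tuple(row[a] for a in attrs) for row in rows]


def distinct_ratio(rows, attrs):
    """Fraction of distinct projected values (assumed definition)."""
    projected = project(rows, attrs)
    if not projected:
        raise ValueError("table is empty")
    return len(set(projected)) / len(projected)


def separation_ratio(rows, attrs):
    """Fraction of tuple pairs that differ on attrs (assumed definition)."""
    projected = project(rows, attrs)
    n = len(projected)
    if n < 2:
        raise ValueError("need at least two tuples to form a pair")
    total_pairs = n * (n - 1) // 2
    # Pairs that agree on every chosen attribute fall in the same group,
    # so count C(size, 2) per group instead of enumerating all pairs.
    unseparated = sum(c * (c - 1) // 2 for c in Counter(projected).values())
    return (total_pairs - unseparated) / total_pairs


def sampled_separation_ratio(rows, attrs, sample_size, seed=0):
    """Estimate the separation ratio from a uniform random sample of rows.

    This is a plain sampling heuristic for illustration only; it carries
    none of the guarantees claimed for the paper's algorithms.
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return separation_ratio(sample, attrs)


# Hypothetical toy table.
rows = [
    {"zip": "94305", "age": 30, "sex": "F"},
    {"zip": "94305", "age": 30, "sex": "M"},
    {"zip": "94305", "age": 41, "sex": "F"},
    {"zip": "10001", "age": 41, "sex": "F"},
]

if __name__ == "__main__":
    for attrs in (["zip"], ["zip", "age"], ["zip", "age", "sex"]):
        print(attrs, distinct_ratio(rows, attrs), separation_ratio(rows, attrs))
```

On this toy table the two measures disagree for `["zip", "age"]`: three of the four projected values are distinct, while five of the six tuple pairs are separated, which suggests why the abstract treats them as separate measures.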
Similar resources
Efficient Algorithms for Just-In-Time Scheduling on a Batch Processing Machine
Just-in-time scheduling problem on a single batch processing machine is investigated in this research. Batch processing machines can process more than one job simultaneously and are widely used in semi-conductor industries. Due to the requirements of just-in-time strategy, minimization of total earliness and tardiness penalties is considered as the criterion. It is an acceptable criterion for b...
Finding the Shortest Hamiltonian Path for Iranian Cities Using Hybrid Simulated Annealing and Ant Colony Optimization Algorithms
The traveling salesman problem is a well-known and important combinatorial optimization problem. The goal of this problem is to find the shortest Hamiltonian path that visits each city in a given list exactly once and then returns to the starting city. In this paper, for the first time, the shortest Hamiltonian path is achieved for 1071 Iranian cities. For solving this large-scale problem, tw...
P-Sensitive K-Anonymity with Generalization Constraints
Numerous privacy models based on the k-anonymity property and extending the k-anonymity model have been introduced in the last few years in data privacy research: l-diversity, p-sensitive k-anonymity, (α, k)-anonymity, t-closeness, etc. While differing in their methods and quality of their results, they all focus first on masking the data, and then protecting the quality of the data as a wh...
I-138: Protecting Identifiers in Cross-Domain Environments
Unique identification of objects and their associated data representations have received significant attention in the past 10 years. Developing an efficient identifier allocation and tracking scheme that transparently spans security domains requires finesse. It is not uncommon for information to be created in a lower security domain and copied to a higher domain. The rigor by which the data is ...
Threshold Implementation as a Countermeasure against Power Analysis Attacks
One of the usual ways to find sensitive data or secret parameters of cryptographic devices is to use their physical leakages. Power analysis is one of the attacks which lay in such a model. In comparison with other types of side-channels, power analysis is so efficient and has a high success rate. So it is important to provide a countermeasure against it. Different types of countermeasures use ...
Publication date: 2007